A new constraint for mining sets in sequences

Authors

  • Boris Cule
  • Bart Goethals
  • Céline Robardet

Abstract

Discovering interesting episodes is a popular area in temporal or sequential data mining, with typical examples being the mining of text or protein sequences. In such data, the order in which events appear is analysed, and the user's goal is to identify the regularities that may appear in the dataset, which consists of one or more sequences. The usual approach to episode discovery is to look for episodes consisting of events that frequently appear close to each other. Most current state-of-the-art methods first use a window of fixed length to find sufficiently cohesive episodes and then retrieve those that occur in more windows (or sequences) than a given minimum threshold. The frequency of an itemset X, fr(X), is thus defined as the number of windows in which X appears divided by the total number of possible windows. The use of a fixed-length window is a major limitation of such approaches, as no episode longer than this window can ever be discovered. To remove this limitation, a different method has been proposed that increases the window length proportionally to the size of the candidate set. Still, in this proposal, the window length remains fixed for a particular candidate when counting its frequency in the sequence. Hence, when an episode occurs in the sequence but within a time frame larger than the window size, such occurrences are disregarded. The high frequency of a set of events appearing close together gives no guarantee that a subset of that set will not sometimes appear far away from the rest of the set. Take, for example, the following sequence:
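The window-based frequency described above, fr(X) = (number of windows containing X) / (total number of windows), can be sketched in Python as follows. This is a minimal illustration, not the authors' method: it assumes events are single symbols at consecutive time stamps and that only windows lying entirely inside the sequence are counted (some definitions also count windows that only partially overlap the ends of the sequence). The names `window_frequency` and `proportional_frequency` and the `factor` parameter are hypothetical; the second function only mimics the idea of a window whose length grows with the candidate's size but stays fixed while that candidate is counted.

```python
from fractions import Fraction
from typing import AbstractSet, Sequence


def window_frequency(itemset: AbstractSet[str], sequence: Sequence[str], w: int) -> Fraction:
    """Return fr(X): the fraction of length-w windows that contain every event of X.

    Assumption: windows are contiguous slices lying entirely inside the
    sequence, so there are len(sequence) - w + 1 of them.
    """
    n = len(sequence)
    if not 1 <= w <= n:
        raise ValueError("window length must be between 1 and len(sequence)")
    total = n - w + 1
    # Count the windows whose set of events is a superset of the itemset.
    hits = sum(1 for start in range(total) if itemset <= set(sequence[start:start + w]))
    return Fraction(hits, total)


def proportional_frequency(itemset: AbstractSet[str], sequence: Sequence[str], factor: int) -> Fraction:
    """Hypothetical variant: the window length is factor * |X|, so it grows
    with the candidate's size but is fixed while this candidate is counted."""
    return window_frequency(itemset, sequence, factor * len(itemset))


# Usage: 'a' and 'c' both occur, but their only joint occurrence spans nine time stamps.
seq = "abyyyyyyc"
print(window_frequency({"a", "c"}, seq, 3))        # 0
print(window_frequency({"a", "b"}, seq, 2))        # 1/8
print(window_frequency({"a", "c"}, seq, 9))        # 1
print(proportional_frequency({"a", "c"}, seq, 2))  # 0
```

Under these assumptions, {a, c} receives frequency 0 for every window shorter than nine, including the size-proportional window of length 4, even though both events occur in the sequence. This is the kind of occurrence the abstract says fixed-window methods disregard.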

Similar resources

A new method for 3-D magnetic data inversion with physical bound

Inversion of magnetic data is an important step towards interpretation of the practical data. Smooth inversion is a common technique for the inversion of data. Physical bound constraint can improve the solution to the magnetic inverse problem. However, how to introduce the bound constraint into the inversion procedure is important. Imposing bound constraint makes the magnetic data inversion a n...

Convex Generalized Semi-Infinite Programming Problems with Constraint Sets: Necessary Conditions

We consider generalized semi-infinite programming problems in which the index set of the inequality constraints depends on the decision vector and all emerging functions are assumed to be convex. Considering a lower level constraint qualification, we derive a formula for estimating the subdifferential of the value function. Finally, we establish the Fritz-John necessary optimality con...

Constraint-Based Pattern Set Mining

Local pattern mining algorithms generate sets of patterns, which are typically not directly useful and have to be further processed before actual application or interpretation. Rather than investigating each pattern individually at the local level, we propose to mine for global models directly. A global model is essentially a pattern set that is interpreted as a disjunction of these patterns. I...

Constraint-Based Mining of Formal Concepts in Transactional Data

We are designing new data mining techniques on boolean contexts to identify a priori interesting concepts, i.e., closed sets of objects (or transactions) and associated closed sets of attributes (or items). We propose a new algorithm D-Miner for mining concepts under constraints. We provide an experimental comparison with previous algorithms and an application to an original microarray dataset ...

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

Nowadays, high fuzzy utility-based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...

Publication date: 2009